A Program for Aligning Sentences in Bilingual Corpora

نویسندگان

  • William A. Gale
  • Kenneth Ward Church
چکیده

Researchers in both machine Iranslation (e.g., Brown et al., 1990) and bilingual lexicography (e.g., Klavans and Tzoukermann, 1990) have recently become interested in studying parallel texts, texts such as the Canadian Hansards (parliamentary proceedings) which are available in multiple languages (French and English). This paper describes a method for aligning sentences in these parallel texts, based on a simple statistical model of character lengths. The method was developed and tested on a small trilingual sample of Swiss economic reports. A much larger sample of 90 million words of Canadian Hansards has been aligned and donated to the ACL/DCI.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

- 1 - A Program for Aligning Sentences in Bilingual Corpora

Researchers in both machine translation (e.g., Brown et al., 1990) and bilingual lexicography (e.g., Klavans and Tzoukermann, 1990) have recently become interested in studying bilingual corpora, bodies of text such as the Canadian Hansards (parliamentary proceedings) which are available in multiple languages (such as French and English). One useful step is to align the sentences, that is, to id...

متن کامل

Aligning Sentences in Bilingual Corpora using Lexical Information

In this paper, we describe a fast algorithm for aligning sentences with their translations in a bilingual corpus. Existing efficient algorithms ignore word identities and only consider sentence length (Brown el al., 1991b; Gale and Church, 1991). Our algorithm constructs a simple statistical word-to-word translation model on the fly during alignment. We find the alignment that maximizes the pro...

متن کامل

Bilingual Lexicon Construction Using Large Corpora

This paper introduces a method for learning bilingual term and sentence level alignments for the purpose of building bilingual lexicons. Combining statistical techniques with linguistic knowledge, a general algorithm is developed for learning term and sentence alignments from large bilingual corpora with high accuracy. This is achieved through the use of ltered linguistic feedback between term ...

متن کامل

An Algorithm for Aligning Sentences in Bilingual Corpora Using Lexical Information

In this paper we describe an algorithm for aligning sentences with their translations in a bilingual corpus using lexical information of the languages. Existing efficient algorithms ignore word identities and consider only the sentence lengths (Brown, 1991; Gale and Church, 1993). For a sentence in the source language text, the proposed algorithm picks the most likely translation from the targe...

متن کامل

Second Language Acquisition from Aligned Corpora

The paper describes a system for automatic aligning and searching for translation equivalents in large bilingual corpora. This implementation was developed to facilitate our tasks in GLOSSER #343 Coper-nicus'94 Joint Research Project, where Linguistic Modeling Laboratory was charged especially with preparation of bilingual material. The Gale-Church algorithm is chosen as aligning procedure for ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1991